Abstract

Forecast combination has been proven to be a very important technique to obtain 
accurate predictions. In many applications, forecast errors exhibit heavy tail behaviors 
for various reasons. Unfortunately, to our knowledge, little has been done to deal with 
forecast combination for such situations. The familiar forecast combination methods 
such as simple average, least squares regression, or those based on variance-covariance 
of the forecasts, may perform very poorly. In this paper, we propose two nonparametric
forecast combination methods to address the problem. One is designed for
situations in which the forecast errors are strongly believed to have heavy tails that
can be modeled by a scaled Student's t-distribution; the other is designed for
more general situations in which there is a lack of strong or consistent evidence on the tail
behavior of the forecast errors due to a shortage of data and/or an evolving data-generating
process. Adaptive risk bounds for both methods are developed. Simulations and a real
example show superior performance of the new methods.

Keywords: Forecast Combination, Heavy Tails, Risk Bounds, Robust Forecasting, 
Time Series Models 


* Corresponding author 





1 Introduction 


When multiple forecasts are available for a target variable, well-designed forecast combination
methods can often outperform the best individual forecaster, as demonstrated over the last
fifty years in the literature on applications of forecast combination in fields such as tourism,
wind power generation, finance and economics.

Many combination methods have been proposed from different perspectives since the
seminal work on forecast combination by Bates & Granger (1969). See the discussions and
summaries in Clemen (1989), Newbold & Harvey (2002) and Timmermann (2006) for key
developments and many references. More recently, Lahiri et al. (2013) provided theoretical
and numerical comparisons between adaptive and simple forecast combination methods.
However, to our knowledge, few studies have proposed or discussed forecast combination
methods that target cases where the forecast errors exhibit heavy-tailed behavior. In this paper,
heavy-tailed distributions may sometimes loosely refer to distributions with tails heavier
than those of Gaussian distributions, although specific choices such as t-distributions will be studied.
In many such situations, the familiar forecast combination methods such as simple average,
least squares regression with or without constraints, or those based on the variance-covariance
of the forecasts, may perform very poorly (some numerical examples are provided in sections
4 and 5 of this paper). As a matter of fact, many important variables in finance, economics
and other areas are believed to have heavy tails. For example, Marinelli et al. (2001)
discussed the evidence for heavy-tailed distributions in modeling exchange rates, and Harvey
(2013) modeled the U.S. GDP with a Student's t-distribution with low degrees of freedom.
Therefore, it is practically very useful to design forecast combination methods that handle
heavy-tailed situations.

In this paper, we propose two forecast combination methods following the spirit of the
AFTER strategy of Yang (2004). One is specially designed for situations in which there is
strong evidence that the forecast errors are heavy-tailed and can be modeled by a scaled
Student's t-distribution. The other is designed for more general use. In the former
case, we assume that the forecast errors follow a scaled Student's t-distribution with possibly
unknown scale parameter and degrees of freedom. For situations in which identifying
the heaviness of the tails of the forecast errors is not feasible, normal, double-exponential
and scaled Student's t-distributions are considered simultaneously as candidates for the
distributional form of the forecast errors. In either case, no parametric assumptions are needed
on the relationships among the candidate forecasts.

Technically, if the forecast errors are assumed to follow a normal or a double-exponential
distribution with zero mean, then the conditional probability density functions used in the
combining process of the AFTER scheme can be estimated relatively easily for all the candidate
forecasters because the estimation of the conditional scale parameters is straightforward.
See, e.g., Zou & Yang (2004) and Wei & Yang (2012) for more details. However, this is not
true if a scaled Student's t-distribution is assumed. Among the literature on
maximum likelihood parameter estimation in Student's t-regressions over the last few decades,
Fernandez & Steel (1999) and Fonseca et al. (2008) provided comprehensive summaries of
the convergence properties of the parameter estimates in different situations. Both
showed that estimating the degrees of freedom and the scale parameter simultaneously
in a scaled Student's t-regression model suffers from monotonic likelihood, because the
likelihood goes to infinity as the scale parameter goes to zero if the degrees of freedom $\nu$
is not large enough. To deal with this difficulty, methods other than maximum likelihood
estimation have been proposed in the literature. For example, one may fix the degrees of
freedom first and then estimate the scale parameter using the method of moments or other tools
(see, e.g., Kan & Zhou, 2003).

In this paper, we follow a two-step procedure to estimate the density function given a
forecast error sequence. First, we estimate the scale parameter for each element in a given
candidate pool of degrees of freedom. Note that each combination of the degrees of freedom
and the scale parameter leads to a different estimate of the density function. Second, the
weight of each density estimate is assigned according to its relative historical performance. The final
density estimate is a weighted mean of all the candidate density estimates. More details about
this procedure, including how to determine the pool of candidate estimates, are available in
section 2. There are three major advantages of this procedure. First, because a pool of
degrees of freedom (rather than a single candidate) is considered, it reduces the potential
risk of picking a degrees of freedom parameter that is far from the truth. Second, the
likelihood that each candidate density estimate is the best is decided purely by the data. Third,
the calculation of the combined estimator is easy and fast.

It is worth pointing out that some popular combination methods in the literature make
assumptions on the distributions of forecast errors that do not necessarily exclude heavy-tailed
behavior. For example, methods that are based on the estimation of the variance-covariance of
forecasters require the existence of variances. Regression-based forecast combination
methods (see, e.g., Granger & Ramanathan, 1984) assume the existence of certain moments of
the forecast errors. However, to our knowledge, these methods are not really designed to
handle heavy-tailed errors and are not expected to work well in such situations.

Prior to our work, efforts have been made to deal with error distributions that have
tails heavier than normal by adaptive forecast combination methods. For example, Sancetta
(2010) assumed that the tails of the target variables are no heavier than exponential decay,
which restricts the heaviness of the tails of the forecast errors. Wei & Yang (2012) designed
a method for errors heavier than the normal distributions but not heavier than the
double-exponential distributions. However, none of these methods can deal with forecast errors with
tails as heavy as those of Student's t-distributions. The new AFTER methods in this paper
will be shown to handle such situations.

The plan of the paper is as follows: section 2 introduces the forecast combination method
designed for heavy-tailed error distributions; in section 3, a more general combination method
is proposed. Simulations are presented in section 4, and section 5 provides a real data
example. Section 6 includes a brief concluding discussion. The proofs of the theoretical
results are in the appendix.

2 t-AFTER 

In this section, we propose a forecast combination method for when there is strong evidence
that the random errors in the data-generating process are heavy-tailed and can be modeled
by a scaled Student's t-distribution.

2.1 Problem Setting

Suppose at each time period $i \ge 1$, there are $J$ forecasters available for predicting $y_i$ and
the forecast combination starts at $i_0 \ge 1$. Note that some combination methods may require
$i_0$ to be large enough, e.g., 10, to give reasonably accurate combinations. Let $\hat y_{i,j}$ be the
forecast of $y_i$ from the j-th forecaster. Let $\hat Y_i := (\hat y_{i,1}, \ldots, \hat y_{i,J})$ be the vector of candidate
forecasts for $y_i$ made at time point $i - 1$.

Suppose $y_i := m_i + \varepsilon_i$, where $m_i$ is the conditional mean of $y_i$ given all available
information prior to observing $y_i$ and $\varepsilon_i$ is the random error at time $i$. Assume $\varepsilon_i$ is from a
distribution with probability density function (pdf) $\frac{1}{s_i} h\left(\frac{x}{s_i}\right)$, where $s_i$ is the scale
parameter that depends on the data before observing $y_i$ and $h(\cdot)$ is a pdf with mean 0 and scale
parameter 1.

Let $W_i := (W_{i,1}, \ldots, W_{i,J})$ be a vector of combination weights of $\hat Y_i$. It is assumed that
$\sum_{j=1}^{J} W_{i,j} = 1$ and $W_{i,j} \ge 0$ for any $i \ge i_0$, $1 \le j \le J$. Let $W_{i_0} = (w_1, \ldots, w_J)$ be the initial
weight vector. The combined forecast for $y_i$ from a combination method is:

\[ \hat y_i = \langle \hat Y_i, W_i \rangle, \tag{1} \]

where $\langle a, b \rangle$ stands for the inner product of vectors $a$ and $b$. Specifically, when needed, we
use a superscript $\delta$ on each $W_i$ to denote the combination weights that correspond to the
method $\delta$. For example, in the following sections, $W_i^{A_2}$ and $W_i^{A_1}$ stand for the combination
weights from the $L_2$- and $L_1$-AFTER methods, respectively.

2.2 The Existing AFTER Methods 

As one recent method of adaptive forecast combination, the general scheme of adaptive
forecast combination via exponential re-weighting (AFTER) was proposed by Yang (2004).
It has been applied and studied in, e.g., Fan et al. (2008), Inoue & Kilian (2008), Sanchez
(2008), Altavilla & Grauwe (2010), and Lahiri et al. (2013); Zhang et al. (2013) handled
the case in which the variable to be predicted is categorical.

In the general AFTER formulation, the relative cumulative predictive accuracies of the
forecasters are used to decide their combining weights. Let $\|x\|_1 := \sum_{i=1}^{n} |x_i|$ be the $l_1$-norm
of vector $x = (x_1, \ldots, x_n)$.

The general form of $W_i$ for the AFTER approach is:

\[ W_i = \frac{l_{i-1}}{\|l_{i-1}\|_1}, \tag{2} \]

where $l_{i-1} = (l_{i-1,1}, \ldots, l_{i-1,J})$, and for any $1 \le j \le J$,

\[ l_{i-1,j} = w_j \prod_{i'=i_0}^{i-1} \frac{1}{\hat s_{i',j}}\, h\!\left( \frac{y_{i'} - \hat y_{i',j}}{\hat s_{i',j}} \right), \tag{3} \]

where $\hat s_{i',j}$ is an estimate of $s_{i'}$ from the j-th forecaster at time point $i' - 1$.
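
The re-weighting in (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `after_weights`, the argument layout, and the toy data are hypothetical, and the products are accumulated on the log scale purely for numerical stability.

```python
import math

def after_weights(y, forecasts, scales, h, init_w):
    """Weights W_i from eq. (2)-(3) for the next period (illustrative sketch).

    y[t]            : realized value at past period t
    forecasts[t][j] : forecast of y[t] from forecaster j
    scales[t][j]    : scale estimate used with forecaster j at period t
    h               : standardized error density (scale parameter 1)
    init_w[j]       : initial weight w_j, assumed strictly positive here
    """
    J = len(init_w)
    # Accumulate log l_{i-1,j} instead of the raw product to avoid underflow.
    log_l = [math.log(w) for w in init_w]
    for y_t, f_t, s_t in zip(y, forecasts, scales):
        for j in range(J):
            z = (y_t - f_t[j]) / s_t[j]
            log_l[j] += math.log(h(z)) - math.log(s_t[j])
    # Normalize by the l1-norm; subtracting the max keeps exp() well scaled.
    m = max(log_l)
    l = [math.exp(v - m) for v in log_l]
    total = sum(l)
    return [v / total for v in l]

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Toy data (hypothetical): forecaster 0 tracks y closely, forecaster 1 does not.
y = [1.0, 2.0, 3.0]
forecasts = [[1.1, 2.0], [1.9, 0.5], [3.2, 1.0]]
scales = [[1.0, 1.0]] * 3
w = after_weights(y, forecasts, scales, std_normal_pdf, [0.5, 0.5])
```

The forecaster with the smaller standardized errors receives almost all of the weight, which is the exponential re-weighting behavior that (3) encodes.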

Below, the most commonly used AFTER procedures, the $L_2$-AFTER from Zou & Yang
(2004) and the $L_1$-AFTER from Wei & Yang (2012), are briefly introduced.

$L_2$-AFTER. When the random errors in the data-generating process follow a normal
distribution or a distribution close to a normal distribution, the $L_2$-AFTER is both theoretically
and empirically competitive in providing combined forecasts that perform at least as well as
any individual forecaster in any performance evaluation period plus a small penalty. Let $f_N$
be the pdf of $N(0, 1)$. To get $W_i^{A_2}$, first use $f_N$ as the $h$ in (3), then plug the new $l_{i-1}$ into
(2). The $\hat s_{i,j}$ used in the $L_2$-AFTER, denoted as $\hat\sigma_{i,j}$, is the sample standard deviation of
$\{y_{i'} - \hat y_{i',j}\}_{i' \le i-1}$, assuming the random errors are independent and identically distributed.

$L_1$-AFTER. Let $f_{DE}$ be the pdf of a double-exponential distribution with scale parameter
1 and location parameter 0. To get $W_i^{A_1}$, one can follow the same procedure as for $W_i^{A_2}$ but
use $f_{DE}$ as the $h$ in (3). The $\hat s_{i,j}$ used in the $L_1$-AFTER, denoted as $\hat d_{i,j}$, is the mean of
$\{|y_{i'} - \hat y_{i',j}|\}_{i' \le i-1}$. The $L_1$-AFTER method was designed for robust combination when the
random errors have occasional outliers. See Wei & Yang (2012) for details.
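
To make the two scale estimates concrete, the following sketch computes both from one short error sequence containing an outlier. The function names and the error values are hypothetical, and the use of `statistics.stdev` (which centers at the sample mean and divides by n - 1) is an assumption about what "sample standard deviation" means here.

```python
import math
import statistics

def l2_scale(errors):
    """Sample standard deviation of past forecast errors (L2-AFTER scale)."""
    return statistics.stdev(errors)

def l1_scale(errors):
    """Mean absolute past forecast error (L1-AFTER scale)."""
    return sum(abs(e) for e in errors) / len(errors)

def normal_pdf(z):
    # Standard normal density, the h used by the L2-AFTER.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def double_exp_pdf(z):
    # Double-exponential density with location 0 and scale 1,
    # the h used by the L1-AFTER.
    return 0.5 * math.exp(-abs(z))

errors = [0.5, -1.0, 0.2, 8.0, -0.3]   # hypothetical errors, one large outlier
sigma_hat = l2_scale(errors)
d_hat = l1_scale(errors)
```

The single outlier inflates the standard deviation far more than the mean absolute error, which is one way to see why the $L_1$ version is the more robust of the two.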


2.3 The t-AFTER Methods

Since estimating the degrees of freedom and the scale parameter simultaneously in
a scaled Student's t-regression setting suffers from certain theoretical difficulties, as
mentioned in the introduction, we use a different strategy in this paper. Specifically, we take an
estimation procedure that has two steps:

1. We decide on a pool of candidate degrees of freedom with size $K$. The elements in the
pool are considered to be close to the degrees of freedom of the Student's t-distribution
that describes the random errors well. For each element in the set, we assume it is the
true degrees of freedom and estimate the related scale parameter. So we have $K$ sets of
estimates of the degrees of freedom and scale parameter pair.

2. For each of the $K$ sets of estimates, we find its probability of being the true one based on
the relative historical performances.


This two-step procedure is used in the t-AFTER method for forecast combination when
the random errors have heavy tails that can be described well by a Student's t-distribution.

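Let $\Omega := (\nu_1, \ldots, \nu_K)$ be a set of degrees of freedom for Student's t-distributions. The
choice of $\Omega$ will be discussed later in this subsection. Let $w_{j,k}^{A_t}$ (with $w_{j,k}^{A_t} > 0$ and
$\sum_{k=1}^{K} \sum_{j=1}^{J} w_{j,k}^{A_t} = 1$) be the initial combination weight of the forecaster $j$ under the degrees
of freedom $\nu_k$.

Let the combining weight of $\hat Y_i$ from a t-AFTER method be $W_i^{A_t}$ and the combined
forecast be $\hat y_i^{A_t}$. Then $W_i^{A_t}$ and $\hat y_i^{A_t}$ are obtained via the following steps:


1. Estimate (e.g., by MLE) $s_i$ for each $\nu_k \in \Omega$ and for each candidate forecaster. The
estimate for $s_i$ from the j-th forecaster given $\nu_k$ is denoted as $\hat s_{i,j,k}$.

2. Calculate $W_i^{A_t}$ and $\hat y_i^{A_t}$:

\[ W_i^{A_t} = \frac{l_{i-1}^{A_t}}{\|l_{i-1}^{A_t}\|_1}, \qquad \hat y_i^{A_t} = \langle \hat Y_i, W_i^{A_t} \rangle, \tag{4} \]

where $l_{i-1}^{A_t} = (l_{i-1,1}^{A_t}, \ldots, l_{i-1,J}^{A_t})$ and for $1 \le j \le J$ and any $i \ge i_0 + 1$,

\[ l_{i-1,j}^{A_t} = \sum_{k=1}^{K} l_{i-1,j,k}^{A_t} \quad \text{with} \quad l_{i-1,j,k}^{A_t} = w_{j,k}^{A_t} \prod_{i'=i_0}^{i-1} \frac{1}{\hat s_{i',j,k}}\, f_t\!\left( \frac{y_{i'} - \hat y_{i',j}}{\hat s_{i',j,k}} \,\Big|\, \nu_k \right), \tag{5} \]

where $f_t(\cdot \,|\, \nu)$ is the pdf of a Student's t-distribution with degrees of freedom $\nu$.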
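
The two steps above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the names `t_pdf` and `t_after_weights`, the nested-list layout of the scale estimates, and the toy data are all hypothetical, and the scale estimates are taken as given inputs instead of being computed by MLE.

```python
import math

def t_pdf(z, nu):
    """Density of a standard Student's t with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2.0 * math.log1p(z * z / nu))

def t_after_weights(y, forecasts, scales, pool, init_w):
    """Weights W_i^{A_t} following eq. (4)-(5) (illustrative sketch).

    scales[t][j][k] : scale estimate for forecaster j under pool[k] at period t
    init_w[j][k]    : initial weights, strictly positive and summing to 1
    """
    J, K = len(init_w), len(pool)
    # log l_{i-1,j,k}, one entry per (forecaster, degrees of freedom) pair.
    log_l = [[math.log(init_w[j][k]) for k in range(K)] for j in range(J)]
    for y_t, f_t, s_t in zip(y, forecasts, scales):
        for j in range(J):
            for k, nu in enumerate(pool):
                s = s_t[j][k]
                log_l[j][k] += math.log(t_pdf((y_t - f_t[j]) / s, nu)) - math.log(s)
    m = max(max(row) for row in log_l)
    # l_{i-1,j} is the sum over k of l_{i-1,j,k}; then normalize over j.
    l = [sum(math.exp(v - m) for v in row) for row in log_l]
    total = sum(l)
    return [v / total for v in l]

pool = [1, 3]
y = [0.0, 10.0, 0.5]                        # the second value is an outlier
forecasts = [[0.1, 1.0], [0.2, 1.5], [0.4, 2.0]]
scales = [[[1.0, 1.0], [1.0, 1.0]]] * 3
init_w = [[0.25, 0.25], [0.25, 0.25]]
w = t_after_weights(y, forecasts, scales, pool, init_w)
```

Because the t density decays polynomially, the single outlier in the toy data penalizes both forecasters only mildly, and the weights remain driven by the ordinary periods.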


It is assumed that the elements in $\Omega$ are natural numbers for the sake of convenience.
In general, when no specific information is available to choose the candidate degrees
of freedom efficiently, one can start with a large but relatively sparse pool (say,
{1, 3, 5, 8, 12, 15, 20, 30}) and then narrow it down based on the performance on some
training data sets. When there is strong evidence that the tails of the forecast errors are
heavy, the size of $\Omega$ can be relatively small, say no more than 3 or 5. In this situation, from
our experience, $\Omega = \{1, 3\}$ or $\{1, 3, 5\}$ works well.
Obviously, when the random errors in the true model follow a scaled Student's t-distribution
with a known degrees of freedom $\nu$, then $\Omega := \{\nu\}$. Then (5) can be simplified into:

\[ l_{i-1,j}^{A_t} = w_j \prod_{i'=i_0}^{i-1} \frac{1}{\hat s_{i',j}}\, f_t\!\left( \frac{y_{i'} - \hat y_{i',j}}{\hat s_{i',j}} \,\Big|\, \nu \right), \tag{6} \]

where $w_j$ is the initial weight of the j-th forecaster and $\hat s_{i,j}$ is an estimate of $s_i$ from the j-th
forecaster using all information at and before time point $i - 1$ when the true $\nu$ is known.


2.4 Risk Bounds of the t-AFTER

To avoid potential redundancy, we first give a risk bound on the t-AFTER assuming $\nu$ is
known. A more general theorem that treats $\nu$ (and even the form of the error distribution) as
unknown will be given in section 3.


2.4.1 Conditions 

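Condition 1. There exists a constant $\tau > 0$ such that for any $i \ge i_0$,

\[ \Pr\Big( \sup_{1 \le j \le J} |\hat y_{i,j} - m_i| / s_i \le \sqrt{\tau} \Big) = 1. \]

Condition 2. There exists a constant $\xi_1 > 0$ such that for any $i \ge i_0$ and $1 \le j \le J$,
$\hat s_{i,j} / s_i \ge \xi_1$ with probability 1.

Condition 2'. There exists a constant $0 < \xi_1' < 1$ such that for any $i \ge i_0$ and $1 \le j \le J$,
$\xi_1' \le \hat s_{i,j} / s_i \le 1/\xi_1'$ with probability 1.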

Condition 1 holds when the forecast errors are bounded, which is true in many real
applications, although it excludes some time series models such as AR(1). It is required
for the development of the theorems in this paper. Note that this condition does
not require $y_i$ to be bounded, so it allows large outliers to occur in the random errors.
When the conditional mean of $y_i$ is known to stay in a certain range and the related forecasts
are relatively restricted, the condition holds. See section 3.1 of Wei & Yang (2012) for more
discussion of this condition.

Condition 2 requires that the estimates of the scale parameters are not too small
compared to the truth. Condition 2' requires that the estimates of the scale parameters are
not too far from the truth in either direction.


2.4.2 Risk Bounds for the t-AFTER with a Known $\nu$


Assume the true forecast errors follow a scaled Student's t-distribution with a known degrees
of freedom $\nu$. Let $\sigma_i$ and $s_i$ be the conditional standard deviation and scale parameter,
respectively, of $\varepsilon_i$ at time point $i$ and let $\hat s_{i,j}$ be an estimator of $s_i$ from the j-th forecaster.
Let $q_i(x) = \frac{1}{s_i} f_t\!\left( \frac{x - m_i}{s_i} \,\big|\, \nu \right)$ be the actual conditional error density function at time point $i$
and

\[ \hat q_i^{A_t}(x) = \sum_{j=1}^{J} W_{i,j}^{A_t}\, \frac{1}{\hat s_{i,j}}\, f_t\!\left( \frac{x - \hat y_{i,j}}{\hat s_{i,j}} \,\Big|\, \nu \right), \]

where $W_i^{A_t}$ is defined in (4). So, $\hat q_i^{A_t}$ is the mixture estimator of $q_i$ from the t-AFTER
procedure. Let $D(f \,\|\, g) := \int f \log \frac{f}{g}$ be the Kullback-Leibler divergence between two
density functions $f$ and $g$. So, $E\big(D(q_i \,\|\, \hat q_i^{A_t})\big)$ is a measure of
the performance of $\hat q_i^{A_t}$ as an estimate of $q_i$ under the Kullback-Leibler divergence at time
point $i$.
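
As a numerical illustration of the divergence $D(f \,\|\, g)$, the sketch below approximates it for two scaled t densities by the trapezoidal rule. Everything specific here is an assumption made for illustration: the function names, the integration range and grid, the choice $\nu = 3$, and the single-component "estimate" (the estimator in the text is a mixture over forecasters).

```python
import math

def t_pdf(z, nu):
    """Density of a standard Student's t with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2.0 * math.log1p(z * z / nu))

def scaled_t_pdf(x, loc, scale, nu):
    return t_pdf((x - loc) / scale, nu) / scale

def kl_divergence(f, g, lo=-200.0, hi=200.0, n=40000):
    """Approximate D(f||g) = integral of f*log(f/g) by the trapezoidal rule
    on [lo, hi]; truncating the range is part of the approximation."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * step
        fx, gx = f(x), g(x)
        term = fx * math.log(fx / gx)
        total += term if 0 < i < n else 0.5 * term
    return total * step

nu = 3
q = lambda x: scaled_t_pdf(x, 0.0, 1.0, nu)        # stand-in for the true density
q_hat = lambda x: scaled_t_pdf(x, 0.5, 1.2, nu)    # shifted and mis-scaled estimate
d_same = kl_divergence(q, q)
d_diff = kl_divergence(q, q_hat)
```

The divergence is zero when the estimate coincides with the truth and grows with both the location error and the scale error, which matches the two error terms that appear in the risk bounds.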


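Theorem 1. If the random errors are from a scaled Student's t-distribution with degrees of
freedom $\nu$ and Condition 2 holds, then:

\[ \frac{1}{n} \sum_{i=i_0+1}^{i_0+n} E\, D\big(q_i \,\|\, \hat q_i^{A_t}\big) \le \inf_{1 \le j \le J} \left( \frac{\log(1/w_j)}{n} + \frac{B_1}{n} \sum_{i=i_0+1}^{i_0+n} E\, \frac{(m_i - \hat y_{i,j})^2}{s_i^2} + \frac{B_2}{n} \sum_{i=i_0+1}^{i_0+n} E\, \frac{(\hat s_{i,j} - s_i)^2}{s_i^2} \right). \]

Further, if $\nu$ is strictly larger than 2 and Conditions 1 and 2' hold, then

\[ \frac{1}{n} \sum_{i=i_0+1}^{i_0+n} E\, \frac{(m_i - \hat y_i^{A_t})^2}{\sigma_i^2} \le C \inf_{1 \le j \le J} \left( \frac{\log(1/w_j)}{n} + \frac{B_3}{n} \sum_{i=i_0+1}^{i_0+n} E\, \frac{(m_i - \hat y_{i,j})^2}{\sigma_i^2} + \frac{B_2}{n} \sum_{i=i_0+1}^{i_0+n} E\, \frac{(\hat s_{i,j} - s_i)^2}{s_i^2} \right). \]

In the above, $C$, $B_1$, $B_2$ and $B_3$ are constants. $B_1$ and $B_3$ depend on the constants in
Conditions 2 and 2', respectively. $B_2$ is a function of $\nu$ and $C$ depends on the constants in
Conditions 1 and 2'.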


Remarks. 

1. When only Condition 2 is satisfied, Theorem 1 shows that the cumulative distance
between the true densities and their estimators from the t-AFTER is upper bounded
by the cumulative (standardized) forecast errors of the best candidate forecaster plus a
penalty that has two parts: the squared relative estimation errors of the scale parameters
and the logarithm of the initial weights. This risk bound is obtained without assuming
the existence of variances of the random errors, and $\hat s_{i,j}/s_i$ is only required to be
lower-bounded.

2. When $\nu$ is assumed to be strictly larger than 2 and both Conditions 1 and 2' are satisfied,
Theorem 1 shows that the cumulative forecast errors have the same convergence rate
as the cumulative forecast errors of the best candidate forecaster plus a penalty that
depends on the initial weights and the efficiency of the scale parameter estimation. The risk
bounds hold even if the distribution of the random errors has tails as heavy as $t_3$.

3. If there is no prior information to decide the $w_j$'s in (6), then equal initial weights could
be applied. That is, $w_j = 1/J$ for all $j$. In this case, it is easy to see that the number
of candidate forecasters plays a role in the penalty. When the candidate pool is large,
some preliminary analysis should be done to eliminate the significantly less competitive
ones before applying the t-AFTER.
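
As a small worked instance of Remark 3, consider the part of the penalty that comes from the logarithm of the initial weights, taken here to enter the bound as $\log(1/w_j)/n$. The values $J = 10$ and $n = 100$ below are illustrative assumptions, not settings from the paper.

```latex
\[
  w_j = \frac{1}{J}
  \;\Longrightarrow\;
  \frac{\log(1/w_j)}{n} = \frac{\log J}{n},
  \qquad
  \text{e.g. } J = 10,\ n = 100:\quad
  \frac{\log 10}{100} \approx 0.023 .
\]
```

Under this reading, the weight-related penalty grows only logarithmically in the number of candidates and vanishes at rate $1/n$, so screening out clearly uncompetitive forecasters matters most when the evaluation period is short.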

3 g-AFTER

In section 2, the theoretical risk bounds of the combined forecasts from the t-AFTER are
provided when the random errors are known to have Student's t-distributions. However, the
error distribution is typically unknown.

In this section, we propose a forecast combination method, g-AFTER, for situations
in which there is a lack of strong or consistent evidence on the tail behavior of the forecast
errors due to a shortage of data and/or an evolving data-generating process. A theorem that
allows the random errors to be from one of three popular distribution families (normal,
double-exponential, and scaled Student's t) is provided to characterize the performance of
the g-AFTER.

3.1 The g-AFTER Method


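Let the combining weight of $\hat Y_i$ from the g-AFTER be $W_i^{A_g}$. For any $i \ge i_0$, $W_i^{A_g}$ and the
associated combined forecast are:

\[ W_i^{A_g} = \frac{l_{i-1}^{A_g}}{\|l_{i-1}^{A_g}\|_1}, \qquad \hat y_i^{A_g} = \langle \hat Y_i, W_i^{A_g} \rangle, \tag{7} \]

where $l_{i-1}^{A_g} = (l_{i-1,1}^{A_g}, \ldots, l_{i-1,J}^{A_g})$ and for $1 \le j \le J$,

\[ l_{i-1,j}^{A_g} = l_{i-1,j}^{A_2} + c_1\, l_{i-1,j}^{A_1} + c_2\, l_{i-1,j}^{A_t}, \tag{8} \]

where $l_{i-1,j}^{A_2}$, $l_{i-1,j}^{A_1}$ and $l_{i-1,j}^{A_t}$ are from the $L_2$-, $L_1$- and t-AFTERs, respectively, and $c_1$ and
$c_2$ are non-negative constants that control the relative importance of the $L_2$-, $L_1$- and
t-AFTERs in the g-AFTER. For instance, $c_1$ and $c_2$ can be small when one has evidence that
suggests the random errors are likely to be normally distributed.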
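
A minimal sketch of (7) and (8), assuming the three unnormalized scores $l_{i-1,j}$ have already been computed (with their initial weights included) by the $L_2$-, $L_1$- and t-AFTER recursions. The function names and the numeric inputs are hypothetical.

```python
def g_after_weights(l_a2, l_a1, l_at, c1=1.0, c2=1.0):
    """Combine unnormalized AFTER scores as in eq. (8), then normalize as in (7).

    l_a2[j], l_a1[j], l_at[j] : l_{i-1,j} from the L2-, L1- and t-AFTER
    c1, c2                    : non-negative constants
    """
    if c1 < 0 or c2 < 0:
        raise ValueError("c1 and c2 must be non-negative")
    l_g = [a2 + c1 * a1 + c2 * at for a2, a1, at in zip(l_a2, l_a1, l_at)]
    total = sum(l_g)
    return [v / total for v in l_g]

def combined_forecast(forecasts, weights):
    """Inner product of the candidate forecasts and the weights."""
    return sum(f * w for f, w in zip(forecasts, weights))

# Hypothetical scores for J = 2 forecasters.
l_a2 = [1e-6, 4e-6]
l_a1 = [2e-5, 1e-5]
l_at = [9e-5, 1e-5]
w_g = g_after_weights(l_a2, l_a1, l_at, c1=1.0, c2=2.0)
y_hat = combined_forecast([1.0, 3.0], w_g)
```

Since the three scores are added before normalizing, whichever error family explains the history best dominates the sum automatically; $c_1$ and $c_2$ only tilt the starting point.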


3.2 Conditions 

Condition 3. Suppose the random errors have zero mean and are from one of the three
families (normal, double-exponential, and scaled Student's t), and there exists a constant
$0 < \xi_2 < 1$ such that for any $i \ge i_0$, with probability 1, we have

\[ \xi_2 \le \frac{\hat s_i}{s_i} \le \frac{1}{\xi_2}, \]

where $s_i$ is the actual conditional scale parameter at time point $i$ and $\hat s_i$ refers to any estimate
of $s_i$ used in the g-AFTER.

This condition requires that all the estimates of the scale parameters stay in a reasonable
range around the true values. For the j-th candidate forecaster, $\hat s_i$ is $\hat\sigma_{i,j}$ when associated
with normal errors, is $\hat d_{i,j}$ when associated with the double-exponential, and is $\hat s_{i,j,k}$ when
associated with the scaled Student's t with degrees of freedom $\nu_k$, where $\hat\sigma_{i,j}$, $\hat d_{i,j}$, $\hat s_{i,j,k}$ and
$\nu_k$ are defined in sections 2.2 and 2.3.

Condition 4. When the random errors in the true model follow a scaled Student’s t- 
distribution with degrees of freedom u, assume there exist positive constants u, X and 9 such 
that, 

^ < min(z/fc, z/) — 2 < P, max |z/fc — z/| < A. 

3.3 Risk Bounds for the g-AFTER

Let $w_j^{A_2}$ and $w_j^{A_1}$ be the initial combination weights of the forecaster $j$ in the $L_2$- and
$L_1$-AFTERs, respectively, and $w_{j,k}^{A_t}$ be the initial combination weight of the j-th forecaster under
the degrees of freedom $\nu_k$ in the t-AFTER.

" iilii = ligf • Fij,. is defined in 

(5) and is defined in (8). So, and are the weights of the density 

estimates under normal, double-exponential and scaled Student’s t with degrees of freedom 
Uk in the AFTER procedure at time point i — 1 from the j-th forecast, respectively. Let 
G = C 2 where Ci and C 2 are defined in (8). 

Let $q_i$ be the conditional pdf of $y_i$ at time point $i$ and its estimator from a g-AFTER procedure be:


-Ag 

Qi = 




K 


^i,j 






Theorem 2. If Conditions 3 and 4 hold, then for y- ® from a g-AFTER procedure, we have: 

io+n / io+n 


1 


- ^ ED{qi\\qf‘’) < inf 


n 


/ D 


Bi I, 


*=*0 + 1 


*=*0 + 1 


(t; 


where 


R = 





V 


If Condition 1 also holds, then 


n 


io+n 

E^ 

*=* 0+1 


l<j<J \ 11 

\ *=* 0+1 


a: 


\ 

/ 

{ruj - yij) 

of 


under normal errors; 

under double-exponential errors; 

under scaled t errors. 


+ R\ . 


In the above, $C$, $B_1$, $B_2$ and $B_3$ are constants depending on $\tau$, $\xi_2$ and the parameters in
Condition 4.


Remarks. 
1. Theorem 2 provides a risk bound for more general situations than Theorem
1. That is, as long as the true random errors are from one of the three popular
families, similar risk bounds hold.


2. When there is strong evidence that the errors are highly heavy-tailed, $\Omega$ can be very
small with only small degrees of freedom and the $c_2 w_{j,k}^{A_t}$ in $G$ can be relatively large
(relative to $w_j^{A_2}$ and $c_1 w_j^{A_1}$). The more information on the tails of the error distributions
is available, the more efficient the allocation of the initial weights can be.


3. In particular, when the true random errors have tails significantly heavier than normal and
double-exponential, they could be assumed to be from a scaled Student's t-distribution
with unknown $\nu$ and a (general) t-AFTER procedure is more reasonable. In this case,

7^9 _ ]At 

^ 2 — 1,7 ^ 2 — 1 , 7 * 


Let g, = j-Jt and ‘ and > 0 for all j 

and k. Without assuming Condition 1 is satisfied, it follows for any n > 1: 


io+n 


n 


< inf 

f ^ 1 < T < 


*=*0 + 1 


i<i<J 


/ \og{l/wf ^) ^ Bi 

I n 


+ 


io+n 

*=*0 + 1 


jrii - yij) 

a? 


+ R* 


where $w_{j,k}^{A_t}$ is defined the same as that in section 2.3 and


Bo 


io+n 


R*= inf ^ y E 

l<k<K \ n 




+ Bs 


7=10+1 

If Condition 1 is also satisfied, then it follows:


E -i^k I 


V 


n 


io+n 


- y, 


At\2 


<C inf 

T<'i<j\ n n rr. 




i<i<^ V 


7=7o + l * ^ i=io+l 

where $C$, $B_1$, $B_2$ and $B_3$ are the same as in Theorem 2.


4 Simulations 

We consider two simulation scenarios, with candidate forecasters from linear regression
models and autoregressive (AR) models. Results from the linear regression models show
improvements of the t- and g-AFTERs over the $L_1$- and $L_2$-AFTERs when the random errors have
heavy tails. In the AR settings, the t- and g-AFTERs are compared to many other popular
combination methods in various situations, including cases where the forecast errors have
extremely heavy symmetric or asymmetric tails. We also compared the performance of the t-
and g-AFTERs to other combination methods on the linear regression models and similar
results were found. Only representative results are given here.

In this and the following sections, we have the following settings: 

• Use $\Omega = \{1, 3\}$. The t-AFTER is proposed mostly to be applied when the error
terms exhibit very strong heavy-tailed behavior. When the degrees of freedom of
the Student's t-distribution gets larger, the t-AFTER becomes similar to the $L_1$- or
$L_2$-AFTER. Thus a choice of $\Omega$ with relatively small degrees of freedom in the g-AFTER
should provide good enough adaptation capability. In fact, other options for $\Omega$, such as
$\Omega = \{1, 3, 5, 8, 15\}$, were considered, and similar results were found.

• Since the g-AFTER is usually preferred when users have no consistent
and strong evidence to identify the distribution of the error terms among the three
candidate distribution families, we put equal initial weights on the candidate distributions.
So $c_1 = 1$, $c_2 = 2$, $w_j^{A_2} = w_j^{A_1} = 1/J$ and $w_{j,k}^{A_t} = 1/(2J)$ are used in the g-AFTER. Note
that, for example, if there is clear and consistent evidence that the error distribution
is more likely to be from the normal distribution family, then putting relatively large
initial weights on the $L_2$-AFTER procedure in a g-AFTER can be more appropriate
than using equal weights.

• The $\hat s_{i,j,k}$'s are the sample median of the absolute forecast errors before time point $i$ from
the forecaster $j$ divided by the theoretical median of the absolute value of a random
variable with distribution $t_{\nu_k}$.
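
The median-based scale estimate in the last setting can be sketched as below. This is an illustrative implementation under stated assumptions: the helper names are hypothetical, and the theoretical median of $|T|$ for $T \sim t_\nu$ is found numerically (Simpson's rule plus bisection) so that only the Python standard library is needed.

```python
import math
import statistics

def t_pdf(z, nu):
    """Density of a standard Student's t with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2.0 * math.log1p(z * z / nu))

def prob_abs_t_below(x, nu, n=2000):
    """P(|T| <= x) for T ~ t_nu, via Simpson's rule on [0, x] (n is even)."""
    h = x / n
    s = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 2.0 * s * h / 3.0

def median_abs_t(nu, tol=1e-10):
    """Theoretical median of |T|, located by bisection on [0, 10]."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_abs_t_below(mid, nu) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def t_scale_estimate(past_errors, nu):
    """Sample median of |errors| divided by the theoretical median of |t_nu|."""
    return statistics.median(abs(e) for e in past_errors) / median_abs_t(nu)

s_hat = t_scale_estimate([1.0, -2.0, 3.0], nu=1)
```

Unlike a moment-based estimate, this ratio is well defined even for $\nu = 1$, where the t-distribution has no finite variance, and it is insensitive to a few very large errors.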

4.1 Linear Regression Models 

4.1.1 Simulation Settings 

There are $p$ predictors $(X_1, \ldots, X_p)$ available and the true model uses the first $p_0$ predictors
with coefficients $\beta = (\beta_1, \ldots, \beta_{p_0})$. That is, $Y = \sum_{i=1}^{p_0} X_i \beta_i + \varepsilon$. The $p$ candidate forecasters
are generated from the following $p$ models: $Y = \beta_0 + X_1 \beta_1 + \varepsilon$, $Y = \beta_0 + \sum_{i=1}^{2} X_i \beta_i + \varepsilon$, $\ldots$,
$Y = \beta_0 + \sum_{i=1}^{p} X_i \beta_i + \varepsilon$. We take $p = 2p_0 - 1$ for this scenario. Other settings for $p$ and $p_0$
were also considered and they gave similar results.

The $p$ predictors are generated from a multivariate normal distribution with zero mean
and covariance matrix $\Sigma$ with sample size $n = 125$. For the entries in $\Sigma$, the diagonal
elements are 1 and the off-diagonal elements are 0.8. The candidate forecasts are generated after
the 90-th observation, and the combination starts after the 5th forecast. Various distributions
for the random errors ($\varepsilon$) are considered. Note that we also tried other structures of $\Sigma$,
including the ones with $\Sigma_{ij} = 0.5^{|i-j|}$ and $\Sigma_{ij} = I(i = j)$ for all $1 \le i, j \le p$. The results are
similar.
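
The data-generating step of this design can be sketched as follows. It is an illustrative reconstruction, not the authors' simulation code: the function names are hypothetical, the equicorrelated predictors are built from a shared factor (which gives unit variances and pairwise correlation 0.8), the coefficients are drawn from an assumed uniform range, and the Cauchy error is just one example of a heavy-tailed choice.

```python
import math
import random

def equicorrelated_normals(n, p, rho, rng):
    """n draws of a p-vector with N(0,1) margins and pairwise correlation rho
    (requires 0 <= rho < 1), built from a shared factor plus independent noise."""
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    rows = []
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)
        rows.append([a * z0 + b * rng.gauss(0.0, 1.0) for _ in range(p)])
    return rows

def simulate_response(X, beta, error_draw):
    """Y = sum_{i <= p0} X_i * beta_i + error; only the first len(beta) columns are used."""
    return [sum(b * x for b, x in zip(beta, row)) + error_draw() for row in X]

rng = random.Random(0)
p0 = 3
p = 2 * p0 - 1
X = equicorrelated_normals(125, p, 0.8, rng)
beta = [rng.uniform(1.0, 3.0) for _ in range(p0)]   # assumed coefficient range
# Illustrative heavy-tailed error: a standard Cauchy draw (t with 1 d.f.).
Y = simulate_response(X, beta, lambda: math.tan(math.pi * (rng.random() - 0.5)))
```

The shared-factor construction avoids a Cholesky factorization; other covariance structures mentioned in the text would need a general multivariate normal sampler instead.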

For each set of $\beta$, we generate 200 sets of $(X_1, \ldots, X_p, Y)$ and on each of the 200 sets,
we record the average of $(m_i - \hat y_i)^2$ over the evaluation period (Average Squared Estimation
Error, ASEE hereafter) of each combination method, where $\hat y_i$ is the forecast of $y_i$ from this
method. Note that, since this is a simulation study, the combined forecasts are compared with
the conditional means ($m_i$'s) instead of the observations ($y_i$'s) to better compare the competing
methods. For each competing method, the mean ASEE over the 200 data sets is recorded.

We sample $\beta$ 200 times independently, drawing each of its $p_0$ components from
Unif[1, 3], so 200 sets of mean ASEEs are recorded. In order to compare the performances of
the four AFTER-based methods, the $L_2$-, $L_1$-, t- and g-AFTERs, for each $\beta$, the ratios of
the mean ASEEs of the $L_2$-, t- and g-AFTERs over the mean ASEE of the $L_1$-AFTER are
recorded. The summaries (means and their standard errors) of the 200 sets of ratios are
presented.
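
The evaluation metric described above is simple enough to state in code. The function names and the numbers are hypothetical, and the evaluation window is whatever list of periods is passed in.

```python
def asee(cond_means, combined_forecasts):
    """Average squared estimation error against the conditional means m_i."""
    if len(cond_means) != len(combined_forecasts):
        raise ValueError("inputs must have the same length")
    n = len(cond_means)
    return sum((m - f) ** 2 for m, f in zip(cond_means, combined_forecasts)) / n

def asee_ratio(mean_asee_method, mean_asee_baseline):
    """A ratio below 1 means the method beats the baseline (here, the L1-AFTER)."""
    return mean_asee_method / mean_asee_baseline

m = [1.0, 2.0, 3.0]
a = asee(m, [1.0, 2.5, 2.0])   # (0 + 0.25 + 1) / 3
r = asee_ratio(0.5, 1.0)
```

Comparing against $m_i$ rather than $y_i$ removes the irreducible noise from the metric, so differences between methods are not swamped by the heavy-tailed errors themselves.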

4.1.2 Results 

Three sets of results ($p_0 = 3, 5, 10$ respectively) are presented in Table 1 in this subsection.
In this table, A2, At and Ag stand for the ratios of the mean ASEEs of the $L_2$-, t- and
g-AFTERs over those of the $L_1$-AFTER. The information in the first and second rows indicates
the distributions of $\varepsilon$: $t_\nu$ with $\sigma^2 = 9$ means $\varepsilon \sim k\, t_\nu$ with $\mathrm{Var}(k\, t_\nu) = 9$. The top numbers
in rows 4-6, 8-10 and 12-14 are the means of the 200 ratios. The numbers in the parentheses
are the standard errors of the statistics above them. Rows 3, 7 and 11 give the number
of predictors used in the true models. DE stands for double-exponential with zero mean
hereafter.


4.1.3 Summary 

From Table 1, in the linear regression setting, we see that the overall performances of the t-
and g-AFTERs are relatively more robust than those of the $L_1$- and $L_2$-AFTERs. Specifically:

1. When the random errors have heavy tails, the t- and g-AFTERs consistently provide more
accurate forecasts than the $L_2$- and $L_1$-AFTERs.

2. When the tails of the random error distributions are not or only mildly heavy, say
a normal or a scaled Student's t-distribution with a large degrees of freedom, the
g-AFTER is better than the t-AFTER in terms of forecast accuracy.

3. The $L_1$-AFTER outperforms the $L_2$-AFTER when the random errors have heavy tails
while the $L_2$-AFTER is more accurate than the $L_1$-AFTER when the random errors are
not heavy-tailed.

4.2 AR Models 

4.2.1 Simulation Settings 

Let the true model be an AR($p_0$) process with random errors from certain distributions and
the candidate forecasters be based on AR(1), AR(2), $\ldots$, AR($p$) ($1 \le p_0 \le p$), respectively.
For results on asymptotically optimal model selection for AR models, see, e.g., Ing (2007)
and Ing et al. (2012). We here compare forecast combination methods.

In this scenario, given $p$, $p_0$ is randomly sampled from a uniform distribution on $\{1, 2, \ldots, p\}$.
Given $p_0$, $\beta$ in the true model is generated; given $\beta$, 200 samples with size $n = 125$ from
the true model are generated. On each data sample, the candidate forecasts are generated
after the 90-th observation and the ASEE of the last 20 forecasts is recorded. Also, the
combined forecasts are compared with the conditional means instead of the observations.
For each $\beta$, the mean ASEE of each combining method over the 200 samples is recorded and
ratios of the mean ASEEs of other methods over that of the $L_1$-AFTER are recorded.

We replicate the generation of $p_0$'s (and $\beta$'s) 200 times and report the mean and its
standard error of the 200 ratios for each combination method.
Only the results for $p = 5$ are presented (other choices, such as $p = 8$ and 10, provide
similar results).

4.2.2 Other Combination Methods 

Some other popular combination methods are included in this part and compared with
the newly proposed methods. The simple average combination strategy (SA) uses the average
of the candidate forecasts as the combined forecast. The MD and TM strategies use
the median and the trimmed mean (removing the largest and smallest forecasts before averaging) of
the candidate forecasts, respectively. The variance-covariance estimation based combination
method (denoted as BG because it was first proposed by Bates & Granger (1969)) that we use
in this paper is the version in Hansen (2008). Also, a modified BG method with a discount
factor $0 < \rho < 1$ is considered and the results for multiple values of $\rho$ are presented. In the modified
BG, the estimate of the (conditional) variance of the forecast errors of a forecaster at any
time point is the associated discounted mean squared forecast error with factor $\rho$. See,
e.g., Stock & Watson (2006) for more details. Hereafter, for example, $\mathrm{BG}_{0.9}$ denotes a BG
method with $\rho = 0.9$. Two linear-regression based combination methods are also considered:
one is the combination via ordinary linear regression (LR) and the other is a constrained
linear regression (CLR) combination. The constraints of the CLR are that all coefficients are
non-negative and the sum of the coefficients is 1 (with no intercept in the regressions).
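
To make the simpler benchmarks concrete, a minimal sketch is given below. It rests on assumptions: the BG weights are taken here to be proportional to the inverse (discounted) mean squared forecast errors, ignoring covariances, which may differ in detail from the version in Hansen (2008); the function names are hypothetical, and the LR and CLR combinations are omitted.

```python
from statistics import mean, median


def simple_average(forecasts):
    """SA: plain average of the candidate forecasts."""
    return mean(forecasts)


def median_forecast(forecasts):
    """MD: median of the candidate forecasts."""
    return median(forecasts)


def trimmed_mean(forecasts):
    """TM: drop the single largest and smallest forecast, then average."""
    if len(forecasts) < 3:
        raise ValueError("need at least three forecasts to trim both ends")
    return mean(sorted(forecasts)[1:-1])


def discounted_bg_weights(past_errors, rho=1.0):
    """Weights proportional to inverse discounted mean squared forecast errors.

    past_errors[j] lists the past forecast errors of forecaster j, oldest first.
    rho = 1 weights all past errors equally; 0 < rho < 1 discounts older errors.
    ASSUMPTION: covariances between forecasters are ignored.
    """
    inverse = []
    for errs in past_errors:
        t = len(errs)
        discount = [rho ** (t - 1 - s) for s in range(t)]
        dmsfe = sum(d * e * e for d, e in zip(discount, errs)) / sum(discount)
        inverse.append(1.0 / max(dmsfe, 1e-12))  # guard against a zero error history
    total = sum(inverse)
    return [v / total for v in inverse]


def combine(weights, forecasts):
    """Weighted combination of the candidate forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))
```

For instance, with candidate forecasts `[1.0, 2.0, 3.0, 10.0]`, the SA returns 4.0 while both the MD and the TM return 2.5, which shows how a single extreme candidate moves the simple average.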

4.2.3 Results 

Tables 2 and 3 provide the summaries of the simulation results. In these two tables, $A_2$, $A_t$,
$A_g$, SA, MD, TM, BG, LR and CLR stand for the relative performances of these methods
over that of the $L_1$-AFTER. The other entries are defined as in Table 1.

Table 2 presents the results for the cases in which the random errors are not (or only mildly)
heavy-tailed, while Table 3 contains the results when the random errors have significantly
heavy tails.

4.2.4 Summary

In the autoregression scenario, we see that the t- and g-AFTERs consistently outperform all
other non-AFTER based combination methods in all the simulated situations (heavy-tailed
or not) and outperform the $L_1$- and $L_2$-AFTERs when the random errors are not normal.
Below are some important details:

1. Between the t- and g-AFTER, the latter is more robust since its performance under
all scenarios is the best or close to the best. For the t-AFTER, its advantages over the
$L_1$- and $L_2$-AFTERs become clear as the tails of the distributions of the random errors
get heavier.

2. In both Tables 2 and 3, the CLR is the most competitive method outside the AFTER
family. This is because the constraints in the CLR make its weights relatively more stable
and resistant to dramatic changes. The CLR gets more competitive when the random
errors have heavier tails.

3. The SA and TM are vulnerable to outliers, which hurts their overall performance. We
can see this from both tables.

4. In our settings, as in many real applications, some of the candidate
forecasters are highly correlated, so using only the conditional variances to assign relative
combining weights may not be enough. This explains why the BG and the discounted
BGs are not very competitive, as seen in Tables 2 and 3.

5 Real Data Example 

The M3-competition data contain 3003 financial/economic variables, of which 1428 (N1402-N2829)
have 18 forecasts and the rest have only 6 or 8 forecasts. For each of the 3003
variables, notice that the forecasts are generated all at once (1-, 2-, ..., and up to 6-, 8-
or 18-step ahead) by each forecaster. There were 24 candidate forecasters for each of the
variables. We use the 1428 variables with 18 forecasts to conduct the study
because some combination methods need a few forecasts to train the parameters before
achieving a reasonable level of reliability.

5.1 Data and Settings

Let $\hat{y}_i$ be the forecast of $y_i$ for $n_0 \le i \le n_1$; then the mean squared forecast error (MSFE) is

$$\mathrm{MSFE} = \frac{1}{n_1 - n_0 + 1} \sum_{i=n_0}^{n_1} (y_i - \hat{y}_i)^2 .$$

We use the mean squared forecast errors to measure the prediction
performances of the combination methods on each of the 1428 variables. For each variable,
the MSFE of each of the other combination methods over the MSFE of the SA is reported.

Specifically, using the same notation as in Section 4.2, the averaged relative performances
(MSFE) of the MD, TM, BG, discounted BGs, $A_2$, $A_1$, $A_t$ and $A_g$ over the
SA over the 1428 variables are presented. The main reason that we use the SA as the
benchmark on this real data set is that the SA is one of the most popular combination
methods with a great reputation in a broad range of applications. Since there are too many
candidate forecasters compared to the forecast periods available, the two linear regression
related combination methods discussed in Section 4.2 are not considered here.

For each of the variables with 18 forecast periods, the combination starts after the 6th
forecast and the MSFE of the last 9 forecasts of each method is recorded for performance
comparisons. For each variable, the MSFE ratio of each method over that of the SA is
reported. The summaries of the 1428 ratios of each method, namely the mean (and its standard
error), median, minimum, 1st and 3rd quartiles (denoted as $Q_1$ and $Q_3$, respectively) and
maximum, are reported in Table 4.
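
The evaluation just described (an MSFE per method and variable, its ratio over the SA, and summary statistics of the ratios) can be sketched as below. The numbers are made up for illustration and are not M3 results; the function names are hypothetical, and the quartile rule is Python's default, which may differ from the one behind Table 4.

```python
from math import sqrt
from statistics import mean, median, quantiles, stdev


def msfe(actual, forecast):
    """Mean squared forecast error over the evaluation window."""
    return mean((y - f) ** 2 for y, f in zip(actual, forecast))


def summarize(ratios):
    """Mean, standard error of the mean, median, min, quartiles and max."""
    q1, _, q3 = quantiles(ratios, n=4)  # ASSUMPTION: Python's default quartile rule
    return {
        "mean": mean(ratios),
        "se": stdev(ratios) / sqrt(len(ratios)),
        "median": median(ratios),
        "min": min(ratios),
        "Q1": q1,
        "Q3": q3,
        "max": max(ratios),
    }


# Made-up MSFE values for three hypothetical variables (not M3 results).
msfe_method = [0.8, 1.5, 0.2]
msfe_sa = [1.0, 1.0, 1.0]
ratios = [a / b for a, b in zip(msfe_method, msfe_sa)]
summary = summarize(ratios)
```

A ratio below 1 means the method beats the SA on that variable; because the ratios are bounded below by 0 but unbounded above, the median and quartiles are worth reading alongside the mean.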

Also, a comparison on a subset of the M3-competition data is provided. On this subset,
the variables are considered to have high potential to be heavy-tailed. For each of the 1428
variables with 18 forecast periods, there are some training data (about 70-128 months). We
modeled the training data to find the ones with high potential to have heavy-tailed errors.
Specifically, let $y_t$ be the observed value of a variable at time $t$; we fit each variable with

a model as: $y_t = \beta_0 + \sum_{j=1}^{11} \beta_j I(\text{month} = j) + \beta_{12} y_{t-1} + \cdots + \beta_{16} y_{t-5}$, using AIC in backward

selection, and the ones with kurtosis larger than 3 are considered to have heavy tails. In total,
199 out of the 1428 variables are selected.
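
The final screening step can be illustrated with a short sketch. It assumes that the kurtosis is the moment-based (non-excess) sample kurtosis computed on the residuals of the fitted model; neither the estimator nor the object it is applied to is spelled out above, so both are assumptions, and the model fitting with AIC backward selection is not shown. The series names are hypothetical.

```python
def sample_kurtosis(x):
    """Moment-based kurtosis m4 / m2^2 (equal to 3 for a normal distribution)."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    return m4 / (m2 * m2)


def flag_heavy_tailed(residuals_by_variable, threshold=3.0):
    """Names of the variables whose residual kurtosis exceeds the threshold."""
    return [
        name
        for name, residuals in residuals_by_variable.items()
        if sample_kurtosis(residuals) > threshold
    ]


# Hypothetical residuals: series_a has two large shocks, series_b has none.
residuals = {
    "series_a": [-4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0],
    "series_b": [-1.0, 1.0, -1.0, 1.0],
}
selected = flag_heavy_tailed(residuals)
```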

On the heavy-tailed subset, we want to focus on the comparison between the g-AFTER
and the non-AFTER methods because the comparison inside the AFTER family is well addressed
in the simulation settings. The reason we choose the g-AFTER instead of the t-AFTER for further
comparison is that the g-AFTER is practically more efficient since it performs well even when the
signal of heavy tails is not extremely strong. So, on this subset, the benchmark method is
the g-AFTER and the results are reported in Table 5.

5.2 Summary 

1. From Table 4, the overall performances of the AFTER based methods are better than
those of the other popular combination methods considered. It also shows that the AFTERs
can occasionally be significantly worse than the SA and other methods.

2. From Table 4, it is worth noticing that the performances of the AFTERs can be a
thousand times better while only about 10 times worse than that of the SA. An examination
reveals that for certain variables, such as N1837 and N2217, some candidate forecasters
are consistently and significantly worse than others. In this situation, since the SA
cannot remove the extreme ‘disturbing’ ones before averaging, its performance is extremely
poor. However, the AFTERs essentially ignore the ‘unreasonable’ candidate forecasts,
so they can be significantly better than the SA.

3. Table 4 suggests that the t- and g-AFTERs have competitive performances in general
while being more robust than others, since their overall performances are outstanding
and are still acceptable in the worst cases.

4. From the comparison in Table 5, the g-AFTER is significantly better than the
non-AFTER methods when the random errors are suspected to have heavy tails. So the
robustness of the g-AFTER is supported by the M3-competition data.

6 Conclusions 

Forecast combination is an important tool to achieve better forecasting accuracy when multiple
candidate forecasters are available. Although many popular forecast combination methods
do not necessarily exclude heavy-tailed situations, little is found in the literature that
examines the performances of forecast combination methods in such situations with theoretical
characterizations.

In this paper, we propose combination methods designed for cases when forecast errors
exhibit heavy-tail behaviors that can be modeled by a scaled Student’s t-distribution and
for cases when the heaviness of the tails of the forecast errors is not easy to identify. The t-AFTER
models the heavy-tailed random errors with scaled Student’s t-distributions with unknown
(or known) degrees of freedom and scale parameters. A candidate pool of degrees of freedom
is proposed to solve the estimation problem, and the resulting t-AFTER works well, as seen
in the simulations and the real data example.

However, in many cases the heaviness of the tails of the random errors is difficult to
identify. Therefore, we design a combination process for general use and call it g-AFTER.
For these situations, instead of assuming a certain distribution form for the random errors, a
set of possible degrees of tail heaviness is considered and the combination process automatically
decides which ones are more reasonable by giving them high weights. The numerical results
suggest that the performance of the g-AFTER is more robust than that of other popular combination
methods because of its adaptive capability. The design of the g-AFTER provides a general
idea: when there are multiple reasonable candidate distributions for the random errors,
combining them in an AFTER scheme like the g-AFTER for forecast combination should
work well.
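
The general idea of weighting forecasters and candidate error distributions jointly can be illustrated with a simplified sketch. It is not the g-AFTER algorithm as specified in the paper: the scales are held fixed here (an assumption; the method estimates them sequentially), the weights are computed once from all observations rather than updated before each forecast, and all function names are hypothetical.

```python
from math import exp, gamma, log, pi, sqrt


def normal_pdf(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def double_exp_pdf(x):
    return 0.5 * exp(-abs(x))


def make_t_pdf(nu):
    """Density of a Student's t-distribution with nu degrees of freedom."""
    c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return lambda x: c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def mixture_weights(y, forecasts, scales, densities):
    """Likelihood-based weights over (forecaster j, error density k) pairs.

    forecasts[j][i] is forecaster j's forecast of y[i]; scales[j] is a fixed
    scale for forecaster j (ASSUMPTION); a uniform prior over pairs is used.
    """
    n_pairs = len(forecasts) * len(densities)
    logw = [[-log(n_pairs) for _ in densities] for _ in forecasts]
    for i, yi in enumerate(y):
        for j, fj in enumerate(forecasts):
            z = (yi - fj[i]) / scales[j]
            for k, h in enumerate(densities):
                # floor avoids log(0) when a density underflows for a huge error
                logw[j][k] += log(max(h(z), 1e-300) / scales[j])
    top = max(max(row) for row in logw)  # subtract the max for numerical stability
    w = [[exp(v - top) for v in row] for row in logw]
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w]


def forecaster_weights(w):
    """Marginal weight of each forecaster, summing over the error densities."""
    return [sum(row) for row in w]


y_obs = [0.1, -0.3, 0.2, 0.0, -0.1]
cands = [[0.0] * 5, [2.0] * 5]  # an accurate and a biased hypothetical forecaster
w = mixture_weights(y_obs, cands, [1.0, 1.0], [normal_pdf, double_exp_pdf, make_t_pdf(3)])
fw = forecaster_weights(w)
```

Here nearly all of the weight goes to the first forecaster, and within each forecaster the weight is spread over the candidate densities according to how well each explains the observed errors.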

Appendix

A.1

In this subsection, some simple facts are given. They are used in A.2 of the appendix. 

• Fact 1: $1 - (1-t)^a \le at/(1-t)$ for $a > 0$, $0 \le t < 1$. Let $f(t, a) = 1 - (1-t)^a - at/(1-t)$;

then $f(t, a) \le 0$ since $\partial f/\partial t = a(1-t)^{-2}\{(1-t)^{a+1} - 1\} \le 0$ and $f(0, a) = 0$.

• Fact 2: $\log(x) \le x - 1$ for $x > 0$.
• Fact 3: For any $c > 0$, $B(a, b)/B(a, b + c)$ decreases as $b$ increases. The proof is purely
algebraic, using

$$B(x, y) = \frac{x + y}{xy} \prod_{n=1}^{\infty} \left(1 + \frac{xy}{n(x + y + n)}\right)^{-1}.$$

• Fact 4: $E(1 + Y^2/\nu)^{-1} = \nu/(\nu + 1)$, where $Y \sim t_\nu$ conditional on $\nu$. Let $Z = Y\sqrt{(\nu + 2)/\nu}$;

then it is easy to show that $E(1 + Y^2/\nu)^{-1} = B(1/2, (\nu + 2)/2)/B(1/2, \nu/2) = \nu/(\nu + 1)$.

• Fact 5: $(s^2 - 1)/2 - \log(s) \le \frac{2 + s_0}{2 s_0}(1 - s)^2$ if $s \ge s_0 > 0$. Use Fact 2 to show that
$-\log(s) = \log(1 + (1 - s)/s) \le (1 - s)/s$.
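
As a quick numerical sanity check of Facts 1, 4 and 5 (not a proof), the quantities can be evaluated directly. The sketch assumes that the constant in Fact 5 is $(2 + s_0)/(2 s_0)$, which is what the argument via Fact 2 yields since $(s^2 - 1)/2 + (1 - s)/s = (s + 2)(s - 1)^2/(2s)$; the function names are hypothetical.

```python
from math import gamma, log


def beta_fn(a, b):
    """Beta function B(a, b) expressed through gamma functions."""
    return gamma(a) * gamma(b) / gamma(a + b)


def fact1_gap(t, a):
    """at/(1-t) - [1 - (1-t)^a]; Fact 1 says this is >= 0 for a > 0, 0 <= t < 1."""
    return a * t / (1 - t) - (1 - (1 - t) ** a)


def fact4_ratio(nu):
    """B(1/2, (nu+2)/2) / B(1/2, nu/2), which Fact 4 equates with nu/(nu+1)."""
    return beta_fn(0.5, (nu + 2) / 2) / beta_fn(0.5, nu / 2)


def fact5_gap(s, s0):
    """(2+s0)/(2 s0) (1-s)^2 - [(s^2-1)/2 - log s]; >= 0 for s >= s0 > 0."""
    return (2 + s0) / (2 * s0) * (1 - s) ** 2 - ((s * s - 1) / 2 - log(s))


# Evaluate the three facts on small grids of arguments.
fact1_ok = all(fact1_gap(t / 10, a) >= -1e-12 for t in range(10) for a in (0.5, 1, 2, 5))
fact4_ok = all(abs(fact4_ratio(nu) - nu / (nu + 1)) < 1e-12 for nu in (1, 2, 3, 10, 30))
fact5_ok = all(
    fact5_gap(s0 * (1 + k / 4), s0) >= -1e-12 for s0 in (0.25, 0.5, 1.0) for k in range(40)
)
```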


A.2 


Lemma 1 Let hy{x) be the density function of fi,, ^ > 0 and A > 0 be constants. Then for 
any 0 < sq < s, ^ < niin(z/, u') — 2 < u and |z/ — z/'| < A, we have 


V — V 


V 


[ hy{x) log yt —+ Cs 

J j 

where Ui, C 2 and U 3 are constants depending on sq, z/, v and A. 

Proof: After a proper reorganization, we have 

r., hy{X) , ,, 1, z/' , 

^ lu = log(5) + - log - + log 

s^^'\ s ) ^ ^ -°(2’ 2 ) 

+ ^ + 2 / )- 3 —log- 

\ 2 ' ' 2 V 

• Let z/* = niin(z/, z/') and using the Facts 1, 2 and 3, then; 

1 v' 


log^^iii) < 


< 


/ - (1 - 

5(1,1)^ 5(1,1) “ i?(if) 

_ I,, _ ,,-1 B(|, ) _ 1 ^ - 5(1, B(i, 

2 B(i|) 


1 2 
z/— z/'l 1 5(1, |) |z/ —z/'l 






< 


2 - 1 B(i, s±2; 

I z/ — z/' I z/ + A 


1 z ^+2 \ 


2 ’ 2 > 

5(1,1) ^ |z 2 -z 2 '|^+A 5(1,1) 


z/ z/*-l5(l,^ 


z/ + l5(l,^) 


u Z/ + 1 


Using Fact 2 in A.l, it follows: 1 log — < 


iidnE < liiLzid 


— 2 — 2 z/ 
• It is easy to show that:


E { log(s) + log(l + )-^ log(l + — 


= E I log(s) - (1 + v') log(s) + 


log 


2 

s^ + 


1 + ^/ 


1 + ^ 


V — V 


log(l + XVz/) 


< —v' log(s) + E 
2 + So , 




1 + X2/Z/ 


+ X \u — z/|/z/ 


z/ + 3 


< (2 + - sf + 


\v - u 


V 


2so u_ + 2 

where Cg is a constant depending on Sq, v and A. 

The proof can be completed by combining these steps. 

Note that if $\nu$ is known, then $\nu = \nu'$. Then,

hy{X) 2 + so^ ,^2 I lj.2 


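Lemma 2 Let $h(x)$ be the density function of a double-exponential distribution with
$\mu = 0$ and $d = 1$; then for $s_0 > 0$ and $s \ge s_0$ it follows:

$$\int h(x) \log \frac{h(x)}{\frac{1}{s} h\left(\frac{x - t}{s}\right)}\, dx \le C_4 (1 - s)^2 + C_5 t^2,$$

where $C_4$ and $C_5$ are constants depending only on $s_0$.

Proof: since $h(y) = \frac{1}{2}\exp(-|y|)$ and $\exp(-x) \le 1 - x + \frac{x^2}{2}$ for $x \ge 0$, then

$$E \log \frac{h(Y)}{\frac{1}{s} h\left(\frac{Y - t}{s}\right)} = \log(s) + E\left(\frac{|Y - t|}{s} - |Y|\right) = \log(s) + \frac{|t| + e^{-|t|}}{s} - 1$$

$$\le (s - 1) + \frac{1 + t^2/2}{s} - 1 = \frac{t^2}{2s} + (1 - s)^2 \frac{1}{s} \le \frac{t^2}{2 s_0} + \frac{1}{s_0}(1 - s)^2.$$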

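Lemma 3 Let $h(y)$ be the density function of a standard normal distribution; then
for $s_0 > 0$ and $s \ge s_0$ it follows:

$$\int h(y) \log \frac{h(y)}{\frac{1}{s} h\left(\frac{y - t}{s}\right)}\, dy \le C_6 (1 - s)^2 + C_7 t^2,$$

where $C_6$ and $C_7$ are constants depending only on $s_0$.
Proof: using Fact 2,

$$E \log \frac{h(Y)}{\frac{1}{s} h\left(\frac{Y - t}{s}\right)} = \log(s) + \frac{1 + t^2}{2 s^2} - \frac{1}{2} = \frac{t^2}{2 s^2} + \log(s) + \frac{1}{2 s^2} - \frac{1}{2}$$

$$\le \frac{t^2}{2 s^2} + (s - 1) + \frac{1 - s^2}{2 s^2} = \frac{t^2}{2 s^2} + \frac{2s + 1}{2 s^2}(s - 1)^2 \le \frac{t^2}{2 s_0^2} + \frac{2 s_0 + 1}{2 s_0^2}(s - 1)^2.$$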

A.3 


In this subsection, we prove Theorem 1. 

Conditional on the information available until time point i, it is assumed that ^ 

where is the conditional scale parameter at time i. Let Si^ be the estimator of .* from the 
j-th forecaster. 

Let /" = nip. (“) 1" = E'i. tt, ntpi P* (^). where ft(.) is the 

density function of and tt^ is the initial combining weight of the j-th forecaster. So, g” is 
the estimator of /"■. 

Then, for any 1 < j' < J, 

n io+n 1 uf yi-m.i \ 
i=iQ + lsi\ Si ) 


^og{r/q^) < log 




1 Lh{y^^) 

log —+ V log ^ ’ 

^ 1=10+1 +e/'H Si,, ) 


Conditional on all the information before time point i, 


Ei log ■ 


1 u ( Yi-mj \ 

Sj \ Sj ) 

Sa At \ Sa a! / 


. \u{yi-Eni\ 

h{ —^jlog -^-TYz^dyi 




= / h{x) log■ 


^i,j' ^ ^i,j’ ’ 

h{x) 


. ^, hi 






dx 


By Lemma 1 in A.2,


B. log fhPP) < {y„. -mf + sjhr - 


Sa a! \ Sa a/ / 


2.2 


where Bi = z/^. So, 

h 2so ’ 


io+n 


io+n 


- ED{qi\\qf^)< ini 

n ^—<■ 1<J<J 


(jjij - m. 


n 


i=jo+l 



i=io + l 


jj io+n 

•' +Eiy E- 

n ^ 

^=^0 + l 


^i) 
From Theorem 1 of Yang (2004), there exists a constant $C$ depending on the parameters
in Conditions 1 and 2', such that




ED{q,\\q{“) > 

O (7 A 


Therefore, 


'log^ 


7 io+n /« \2 T-> io+n 

w. n‘) ' 


+ — ^ 

n ^ 


{xji^j -rriif B 3 


+ — ^ 
rt 


ySi,j Si) 


- V ^ < C inf 

n af i<i<J \ n n ^^ cr; n ^^ s" 

2=20 + 1 ^ \ 2=io + l ^ 2 = 20 + 1 

where B 2 is a function of z/ and -83 is deducted the same as B^ but under Condition 2' instead 
of Condition 2. 


A.4 

The essential part of the proof of Theorem 2 is provided in this subsection. We only provide the
steps of the proof when the random errors are scaled Student’s t-distributed since the proofs for
the other situations are similar.

Let $\hat{s}_{i,j,k}$ be the estimator of $s_i$ from the $j$-th forecaster assuming $\nu_k$ is the true degrees
of freedom. If Condition 4 holds, then obviously

K J io+n 


k=l j=l 


i=io+l 


9 " > E E n +'‘4 


So, for any $j^*$ and $k^*$,


fn 

log ^ < log 

qn 


n io+n 1 u( yi-mi \ 
i=io+l Si\ Si ) 


C2wpu,/G ns 


_ Vi-Vij* 

*■*=*0+1 Sij* "'’'k* y Sij*^f,* 

Similarly, by Lemma 1 in A.2,

1 Z, t yi-rrii 


= log 


Si,j,k 


G 


io+n 


i=io+l 


E '“sA 


Si \ Si / 


h(yjz]E2i) 

\ Sa a* ]c* ' 


^i,j* ,k* ^ 


.™'') {yii* — T+iiY (si i* k* — SiY it'fc —^1 

Eilog — ^ / —< Bi ^ + .62 0 - — + B 3 \^ ' 

^ 1 k( ^i-yid* \ ~ ' ■^1 

\ Sa a* u* / 


at 

^i,j* ,k* ' 

The rest of the proof is similar to that of Theorem 1. 

Table 1: Simulation Results on the Linear Regression Models

Method   t_3            t_3            DE             DE             t_10           t_10           normal         normal
         σ² = 1         σ² = 9         σ² = 1         σ² = 9         σ² = 1         σ² = 9         σ² = 1         σ² = 9

p_0 = 3
A_2      1.302 (0.009)  1.043 (0.003)  1.116 (0.004)  1.028 (0.001)  0.983 (0.003)  0.958 (0.001)  0.926 (0.002)  0.931 (0.001)
A_t      0.943 (0.002)  0.980 (0.001)  0.983 (0.001)  0.995 (0.001)  0.941 (0.003)  0.955 (0.001)  0.932 (0.001)  0.942 (0.001)
A_g      0.944 (0.002)  0.967 (0.001)  0.974 (0.001)  0.977 (0.001)  0.940 (0.001)  0.950 (0.001)  0.926 (0.001)  0.938 (0.001)

p_0 = 5
A_2      1.257 (0.008)  1.066 (0.004)  1.088 (0.003)  1.026 (0.001)  0.980 (0.002)  0.955 (0.001)  0.937 (0.002)  0.927 (0.001)
A_t      0.950 (0.002)  0.967 (0.001)  0.976 (0.001)  0.982 (0.001)  0.951 (0.001)  0.950 (0.001)  0.943 (0.001)  0.938 (0.001)
A_g      0.951 (0.001)  0.958 (0.001)  0.971 (0.001)  0.970 (0.001)  0.949 (0.001)  0.944 (0.001)  0.939 (0.001)  0.933 (0.001)

p_0 = 10
A_2      1.166 (0.006)  1.056 (0.003)  1.035 (0.002)  0.998 (0.001)  0.968 (0.002)  0.949 (0.001)  0.946 (0.001)  0.929 (0.001)
A_t      0.950 (0.002)  0.957 (0.001)  0.964 (0.001)  0.965 (0.001)  0.949 (0.001)  0.946 (0.001)  0.948 (0.001)  0.939 (0.001)
A_g      0.945 (0.001)  0.949 (0.001)  0.961 (0.001)  0.955 (0.001)  0.944 (0.001)  0.939 (0.001)  0.942 (0.001)  0.933 (0.001)

Table 2: Simulation Results on the AR Models with p = 5 (not or only mildly heavy-tailed)

Method   normal          normal          normal          t_10            t_10            t_10            DE              DE              DE
         σ² = 1          σ² = 4          σ² = 9          σ² = 1          σ² = 4          σ² = 9          σ² = 1          σ² = 4          σ² = 9

A_2      0.941 (0.004)   0.940 (0.004)   0.940 (0.004)   0.972 (0.004)   0.972 (0.003)   0.971 (0.003)   1.030 (0.004)   1.032 (0.003)   1.033 (0.004)
A_t      0.954 (0.003)   0.953 (0.003)   0.954 (0.003)   0.961 (0.002)   0.962 (0.003)   0.962 (0.003)   0.997 (0.001)   1.001 (0.001)   0.995 (0.001)
A_g      0.948 (0.003)   0.947 (0.004)   0.948 (0.004)   0.957 (0.003)   0.959 (0.003)   0.958 (0.003)   0.978 (0.002)   0.983 (0.001)   0.976 (0.002)
SA       2.892 (0.268)   2.484 (0.166)   2.408 (0.189)   2.372 (0.167)   2.297 (0.174)   2.070 (0.127)   2.278 (0.148)   2.176 (0.151)   2.483 (0.148)
MD       1.681 (0.137)   2.025 (0.191)   1.824 (0.187)   1.884 (0.243)   1.874 (0.197)   1.421 (0.076)   1.740 (0.137)   1.602 (0.144)   1.943 (0.168)
TM       1.805 (0.121)   1.946 (0.144)   1.754 (0.134)   1.838 (0.156)   1.705 (0.138)   1.469 (0.066)   1.723 (0.109)   1.571 (0.093)   1.885 (0.120)
BG       1.441 (0.047)   1.462 (0.051)   1.389 (0.047)   1.425 (0.042)   1.364 (0.040)   1.321 (0.032)   1.431 (0.046)   1.357 (0.035)   1.500 (0.045)
BG_0.95  1.432 (0.047)   1.453 (0.050)   1.381 (0.047)   1.417 (0.042)   1.358 (0.040)   1.315 (0.032)   1.427 (0.045)   1.353 (0.035)   1.495 (0.045)
BG_0.9   1.429 (0.047)   1.449 (0.049)   1.378 (0.047)   1.414 (0.042)   1.355 (0.039)   1.313 (0.032)   1.425 (0.045)   1.352 (0.035)   1.492 (0.045)
BG_0.8   1.433 (0.047)   1.452 (0.050)   1.382 (0.047)   1.417 (0.042)   1.357 (0.040)   1.315 (0.032)   1.427 (0.045)   1.353 (0.035)   1.491 (0.044)
BG_0.7   1.447 (0.048)   1.464 (0.051)   1.394 (0.049)   1.428 (0.043)   1.366 (0.040)   1.322 (0.033)   1.432 (0.046)   1.357 (0.036)   1.495 (0.045)
LR       7.956 (0.346)   8.355 (0.339)   8.491 (0.342)   8.856 (0.387)   10.210 (1.032)  9.138 (0.363)   11.110 (0.504)  11.240 (0.509)  10.040 (0.513)
CLR      1.036 (0.011)   1.024 (0.013)   1.036 (0.012)   1.032 (0.011)   1.036 (0.010)   1.042 (0.011)   1.072 (0.011)   1.070 (0.011)   1.045 (0.013)

Table 3: Simulation Results on the AR Models with p = 5 (heavy-tailed)

Method   t_3               t_3               t_3               log-normal        log-normal        log-normal
         σ² = 1            σ² = 4            σ² = 9            σ = 0.25          σ = 0.5           σ = 1

A_2      1.058 (0.009)     1.056 (0.008)     1.053 (0.008)     0.964 (0.003)     1.024 (0.004)     1.051 (0.010)
A_t      0.955 (0.006)     0.947 (0.006)     0.961 (0.006)     0.951 (0.003)     0.940 (0.004)     0.921 (0.008)
A_g      0.950 (0.006)     0.943 (0.006)     0.957 (0.006)     0.950 (0.003)     0.946 (0.004)     0.926 (0.008)
SA       2.047 (0.107)     1.889 (0.098)     1.931 (0.139)     2.253 (0.173)     2.143 (0.115)     1.730 (0.087)
MD       1.692 (0.135)     1.396 (0.066)     1.657 (0.182)     1.517 (0.097)     1.441 (0.085)     1.370 (0.078)
TM       1.625 (0.091)     1.438 (0.060)     1.508 (0.112)     1.559 (0.086)     1.555 (0.080)     1.404 (0.057)
BG       1.369 (0.034)     1.307 (0.025)     1.286 (0.033)     1.329 (0.039)     1.374 (0.038)     1.278 (0.025)
BG_0.95  1.365 (0.033)     1.303 (0.025)     1.282 (0.033)     1.322 (0.038)     1.370 (0.038)     1.275 (0.025)
BG_0.9   1.360 (0.033)     1.299 (0.025)     1.277 (0.032)     1.319 (0.037)     1.367 (0.037)     1.271 (0.024)
BG_0.8   1.352 (0.032)     1.290 (0.024)     1.269 (0.030)     1.320 (0.038)     1.366 (0.037)     1.259 (0.023)
BG_0.7   1.345 (0.032)     1.284 (0.023)     1.263 (0.030)     1.327 (0.039)     1.368 (0.037)     1.248 (0.023)
LR       95.280 (60.670)   38.290 (7.566)    46.220 (9.192)    9.316 (0.375)     13.180 (0.891)    174.000 (56.286)
CLR      1.014 (0.010)     1.007 (0.010)     1.016 (0.010)     1.046 (0.011)     1.032 (0.011)     0.974 (0.010)

Note: For the columns of ‘log-normal’, the σ's are the scale parameters.

Table 4: Results on the 1428 Variables of the M3-Competition Data

Method   mean     se       median   min      Q_1      Q_3      max

MD       1.050    0.010    1.022    0.002    0.910    1.143    5.341
TM       0.990    0.004    1.000    0.002    0.974    1.023    2.437
BG       0.784    0.010    0.838    0.001    0.596    0.973    5.227
BG_0.95  0.775    0.010    0.832    0.001    0.582    0.969    7.715
BG_0.9   0.768    0.012    0.825    0.001    0.564    0.966    11.45
BG_0.8   0.758    0.019    0.806    0.001    0.529    0.960    24.08
BG_0.7   0.757    0.031    0.793    0.001    0.503    0.956    43.19
A_1      0.708    0.016    0.649    0.001    0.307    0.994    11.50
A_2      0.697    0.017    0.639    0.001    0.309    0.979    13.32
A_t      0.708    0.015    0.646    0.001    0.312    1.003    8.632
A_g      0.696    0.014    0.645    0.001    0.308    0.987    7.710

Table 5: Results on the Heavy-tailed Subset

Method   mean     se       median   min      Q_1      Q_3      max

SA       7.738    1.695    2.259    0.131    1.311    5.244    82.734
MD       8.088    2.005    1.912    0.222    1.162    4.974    120.428
TM       7.607    1.664    2.299    0.129    1.267    5.175    78.481
BG       2.017    0.217    1.431    0.241    0.965    2.472    12.551
BG_0.9   1.846    0.182    1.337    0.208    0.958    2.444    10.383
BG_0.8   1.656    0.150    1.340    0.179    0.851    2.074    8.577
BG_0.7   1.536    0.141    1.256    0.158    0.813    1.673    7.746